Supplementary Material of Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Authors
Abstract
Before the proof, we detail the assumptions needed for Theorem 1. The Stochastic Differential Equation (SDE) associated with pSGLD has an invariant measure ρ(θ), and the posterior average of a test function φ(θ) of interest is defined as φ̄ ≜ ∫_X φ(θ)ρ(θ)dθ. Given samples (θ_t)_{t=1}^T from pSGLD, we use the sample average φ̂ = (1/T) Σ_{t=1}^T φ(θ_t) to approximate φ̄. In the analysis, we define a functional ψ that solves the following Poisson equation: Lψ(θ_t) = φ(θ_t) − φ̄, where L denotes the generator of the SDE.
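To make the estimator concrete, the short NumPy sketch below computes the sample average φ̂ = (1/T) Σ_t φ(θ_t) that approximates the posterior average φ̄. The function name `sample_average` is hypothetical, and the draws used in the example are synthetic placeholders rather than actual pSGLD output.

```python
import numpy as np

def sample_average(theta_samples, phi):
    """Monte Carlo estimate phi_hat = (1/T) * sum_t phi(theta_t).

    theta_samples : array of shape (T, d), draws (theta_t)_{t=1}^T
    phi           : test function mapping a parameter vector to a scalar
    """
    return np.mean([phi(theta) for theta in theta_samples])

# Example: estimate the posterior second moment of the first coordinate.
rng = np.random.default_rng(0)
theta_samples = rng.normal(size=(1000, 2))   # synthetic stand-in for pSGLD draws
phi_hat = sample_average(theta_samples, lambda th: th[0] ** 2)
print(phi_hat)  # approximates phi_bar = integral of phi(theta) rho(theta) dtheta
```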
منابع مشابه
Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is ...
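As a rough illustration of how adaptive preconditioning combines with Langevin noise, here is a minimal NumPy sketch of a single pSGLD-style step using an RMSprop-type diagonal preconditioner. The function name `psgld_step`, the default hyperparameters, and the omission of the correction term Γ(θ) are assumptions made to keep the sketch short; it is not the paper's reference implementation.

```python
import numpy as np

def psgld_step(theta, stoch_grad, V, eps, alpha=0.99, lam=1e-5, rng=None):
    """One pSGLD-style step with an RMSprop-type diagonal preconditioner.

    theta      : current parameters, shape (d,)
    stoch_grad : minibatch gradient of the negative log-posterior at theta
    V          : running average of squared gradients (state), shape (d,)
    eps        : step size
    """
    rng = rng or np.random.default_rng()
    V = alpha * V + (1.0 - alpha) * stoch_grad ** 2          # second-moment estimate
    G = 1.0 / (lam + np.sqrt(V))                             # diagonal preconditioner
    noise = rng.normal(size=theta.shape) * np.sqrt(eps * G)  # preconditioned Gaussian noise
    theta = theta - 0.5 * eps * G * stoch_grad + noise       # preconditioned Langevin update
    return theta, V
```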
Natural Langevin Dynamics for Neural Networks
One way to avoid overfitting in machine learning is to use model parameters distributed according to a Bayesian posterior given the data, rather than the maximum likelihood estimator. Stochastic gradient Langevin dynamics (SGLD) is one algorithm to approximate such Bayesian posteriors for large models and datasets. SGLD is a standard stochastic gradient descent to which is added a controlled am...
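For comparison, the vanilla SGLD step is a gradient step on the log-posterior plus Gaussian noise whose variance matches the step size. A minimal sketch, assuming the caller supplies `grad_log_post` (the prior gradient plus the minibatch likelihood gradient rescaled to the full dataset):

```python
import numpy as np

def sgld_step(theta, grad_log_post, eps, rng=None):
    """One SGLD step: a gradient step on the log-posterior plus matched noise.

    grad_log_post : stochastic estimate of grad log p(theta | data) at theta,
                    i.e. prior gradient + (N / batch_size) * minibatch likelihood gradient
    eps           : step size; the injected noise has variance eps
    """
    rng = rng or np.random.default_rng()
    noise = rng.normal(size=np.shape(theta)) * np.sqrt(eps)
    return theta + 0.5 * eps * grad_log_post + noise
```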
Preconditioned Stochastic Gradient Descent
Stochastic gradient descent (SGD) is still the workhorse for many practical problems. However, it converges slowly and can be difficult to tune. It is possible to precondition SGD to accelerate its convergence remarkably, but many attempts in this direction either aim at solving specialized problems or result in methods significantly more complicated than SGD. This paper proposes a new method t...
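For concreteness, "preconditioning SGD" here means rescaling the gradient by a matrix (often diagonal) that approximates the inverse local curvature before taking the step. The sketch below shows only this generic preconditioned step; it does not reproduce the specific preconditioner estimator proposed in this paper.

```python
import numpy as np

def preconditioned_sgd_step(theta, grad, precond_diag, lr=0.1):
    """Generic preconditioned SGD step with a diagonal preconditioner.

    precond_diag : positive vector approximating the inverse curvature,
                   e.g. 1 / sqrt(running mean of squared gradients)
    """
    return theta - lr * precond_diag * grad
```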
Learning to Sample Using Stein Discrepancy
We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference. Our method is based on iteratively adjusting the neural network parameters so that the output changes along a Stein variational gradient [1] that maximally decreases the KL divergence with the target distribution. Our method works for any target distribu...
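The Stein variational gradient referenced in [1] can be written down directly for a set of particles. Below is a minimal NumPy sketch using an RBF kernel with a fixed bandwidth; the function name, kernel choice, and bandwidth handling are assumptions for illustration, and this is the particle update itself, not the amortized neural-network sampler proposed here.

```python
import numpy as np

def svgd_direction(particles, grad_log_p, bandwidth=1.0):
    """Stein variational gradient for a particle set, with an RBF kernel.

    particles  : array of shape (n, d)
    grad_log_p : function mapping the (n, d) particle array to the (n, d)
                 array of gradients of log p at each particle
    Returns phi*(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j)
                                      + grad_{x_j} k(x_j, x_i) ].
    """
    diffs = particles[:, None, :] - particles[None, :, :]   # (n, n, d): x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)                   # (n, n)
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))           # RBF kernel matrix, symmetric
    grads = grad_log_p(particles)                            # (n, d)
    attraction = K @ grads                                    # sum_j k(x_j, x_i) grad log p(x_j)
    repulsion = np.sum(K[:, :, None] * diffs, axis=1) / bandwidth ** 2
    return (attraction + repulsion) / particles.shape[0]
```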
The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent
Understanding generalization in deep learning has recently attracted much attention, and the learning algorithm, such as stochastic gradient descent (SGD), plays an important role in generalization performance. Along this line, we study in particular the anisotropic noise introduced by SGD and investigate its importance for generalization in deep neural networks. Through a thorough empi...